ABSTRACT

A method for assessing the performance of a data classifier
operable to generate an element of output data in response
to an element of input data, such as a neural network, is
disclosed. The method includes using the data classifier to
generate elements of result output data in response to
elements of test input data, determining a measure of
difference between each element of test output data and each
corresponding element of result output data, forming a
distribution function of the measures of differences, and
forming a measure of performance from the distribution
function.

23 Claims, 1 Drawing Sheet 
PERFORMANCE ASSESSMENT OF DATA
CLASSIFIERS

FIELD OF THE INVENTION

The present invention relates to methods and apparatus
for assessing the performance of data classifiers, such as
neural networks. One specific field of application is that of
training and assessing the performance of data classifiers to
be used for fraud detection including, in particular,
telecommunications fraud.

BACKGROUND TO THE INVENTION

Data classifiers such as neural networks typically operate
by generating an element of output data in response to an
element of input data. Such a data classifier may be
constructed or trained using a training set of input and output
data elements in such a way that not only is the data classifier
able to reproduce, as accurately as possible, each element of
output training data in response to each corresponding
element of input training data, but it is also able to generate
suitable elements of output data in response to new input
data elements in a plausible and useful manner. Neural
networks achieve this behaviour through the training of a
plurality of interlinked neural nodes, usually constructed in
software, but other schemes are known.

Data classifiers such as neural networks are commonly
used in the detection of patterns or anomalies within large
data sets. A particular application is that of detecting
fraudulent activity on telecommunications networks, such as
illicit emulation of a legitimate mobile telephone through
cloning, tumbling or otherwise misusing a legitimate
identification code.

An element of data for input to a data classifier may
typically take the form of an input vector or similar data
structure. Each input vector typically comprises a collection
of parameters. In a telecommunications fraud detection
system these may, for example, relate to total call time,
international call time and call frequency of a single
telephone in a given time interval. Each input vector is
associated with an element of output data which may be as simple
as a single parameter indicating the likelihood or ascertained
fact that an input vector corresponds to fraudulent use of a
telephone, or may itself take the form of a vector. A trained
data classifier may then be considered to define a mapping
between the input and output data elements.

A data classifier trained or constructed on the basis of a
training set of such corresponding elements of input and
output data should be able to reproduce the output data, in
response to the input data, to a reasonable degree of
accuracy. At the same time it will usually be important to
maintain a good ability to respond in a suitable manner to
new elements of input data, to retain sufficient flexibility to
allow future retraining or adjustments in response to new
training data and to minimise the time or other resources
required in carrying out data classifier training or
construction.

The balancing of these and other pertinent training factors
is frequently achieved, especially in the case of neural
networks, by use of a simple measure of difference between
the "ideal" output data elements, usually defined by the
training data set, and the data elements output by the data
classifier in response to the input elements of the same data
set. A commonly used measure of difference is the square
root of the mean of the squares of these differences, often
referred to as the "rms-error" of the data classifier, or a
related measure of difference.
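
By way of illustration only, the rms-error referred to above may be sketched as follows for scalar output elements. The function name and the use of plain Python lists are assumptions made for the purpose of the example and do not appear in the original text.

```python
import math


def rms_error(ideal_outputs, actual_outputs):
    """Square root of the mean of the squared differences between
    paired ideal and actual scalar output elements."""
    if not ideal_outputs or len(ideal_outputs) != len(actual_outputs):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - y) ** 2 for t, y in zip(ideal_outputs, actual_outputs)]
    return math.sqrt(sum(squared) / len(squared))


# Two fraud examples (ideal output 1.0) and two non-fraud examples (0.0).
print(rms_error([1.0, 1.0, 0.0, 0.0], [0.9, 0.6, 0.1, 0.0]))
```
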
As a data classifier undergoes training the rms-error
should reduce. It may be possible to reduce the rms-error to
close to zero, but this is likely to lead to a data classifier that
is very poor at generating reasonable output data elements in
response to new input data elements, and that is impervious
to retraining. The training process, therefore, may be halted
when the rms-error reaches a predetermined threshold.

Alternatively, a subset of the training data may be kept
aside and used in a separate determination of rms-error.
When this separate determination of rms-error reaches a
minimum and starts to rise again, training is stopped, even
though the rms-error determined from the main body of
training data would continue to fall. This latter method,
while generally robust, has a significant drawback in that a
sizeable proportion of the available training data is not
actually used for training the data classifier, and such early
stopping methods in general have been shown to
significantly inhibit the process of training data classifiers
for use in fraud detection.
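
The early stopping rule just described may be sketched, purely as an example, as a function operating on the sequence of rms-errors determined from the held-out subset after each stage of training. The function name, the return convention and the sample figures are hypothetical.

```python
def early_stopping_epoch(held_out_errors):
    """Return (stop_epoch, best_epoch) for a sequence of held-out
    rms-errors, one per training epoch.

    Training stops at the first epoch whose held-out rms-error is
    higher than the lowest value seen so far; best_epoch is the epoch
    at which that lowest value occurred.
    """
    if not held_out_errors:
        raise ValueError("at least one error value is required")
    best_epoch, best_error = 0, held_out_errors[0]
    for epoch, error in enumerate(held_out_errors[1:], start=1):
        if error > best_error:
            return epoch, best_epoch
        if error < best_error:
            best_epoch, best_error = epoch, error
    # The held-out error never rose: stop at the final epoch.
    return len(held_out_errors) - 1, best_epoch


# The held-out error reaches a minimum at epoch 2 and rises at epoch 3.
print(early_stopping_epoch([0.50, 0.40, 0.30, 0.35, 0.20]))
```
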

The ability of a data classifier to identify patterns or
characteristics in new input data differing considerably in
magnitude or otherwise from the training data is particularly
important for fraud detection. Particular scenarios of fraud
identified within the training data may represent the most
common fraud types, but variations on these scenarios may
be wide ranging, and new methods and types of fraud are
likely to emerge from time to time which may be only
loosely related or indeed unrelated to familiar scenarios.

To some extent it is unrealistic to expect a data classifier 
such as a neural network to provide plausible outputs to new 
input data varying widely from the training data, but 
nevertheless, a significant degree of generalisation by a data 
classifier should be expected. 

OBJECTS OF THE INVENTION 

The present invention seeks to address the above men- 
tioned and other problems of the related prior art. In 
particular, the invention seeks to provide an improved 
method of assessing the performance of a data classifier, and 
an improved method of training a data classifier, as well as 
apparatus for carrying out such methods. 

SUMMARY OF THE INVENTION 

According to a first aspect of the invention there is
provided a method of assessing the performance of a data
classifier operable to generate an element of output data in
response to an element of input data, the method comprising
the steps of:

providing test data comprising elements of test input data
and corresponding elements of test output data;

operating the data classifier to generate elements of result
output data in response to the elements of test input
data;

determining a measure of difference between each
element of test output data and each corresponding
element of result output data;

forming a distribution function of said measures of dif- 
ference; and 

forming a measure of performance of the data classifier 
from said distribution function. 
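
As a minimal sketch of the determining and forming steps listed above, assuming scalar outputs, absolute differences and hypothetical category boundaries, the following counts the measures of difference falling into each category. In the example, both sets of result output data have an rms-error of about 0.1, yet their distributions of differences are quite unlike one another.

```python
from bisect import bisect_left


def distribution_counts(test_outputs, result_outputs, upper_bounds):
    """Count absolute differences per category.

    upper_bounds holds ascending, inclusive upper limits; one further
    open-ended category catches any larger difference.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for t, y in zip(test_outputs, result_outputs):
        counts[bisect_left(upper_bounds, abs(t - y))] += 1
    return counts


# Hypothetical boundaries: negligible, small, moderate, large.
bounds = [0.05, 0.25, 0.5]
test = [1.0] * 100
few_large = [0.0] + [1.0] * 99   # one large difference
many_small = [0.9] * 100         # one hundred small differences

print(distribution_counts(test, few_large, bounds))
print(distribution_counts(test, many_small, bounds))
```
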

The distribution function provides information on the way
in which errors or mismatches between the test output data
and result output data are distributed. A given rms-error
based on the differences between a number of elements of
"ideal" test output data and actual result output data may
result from a lesser number of large differences or a greater
number of small differences. Depending on the practical use
to which the data classifier is to be put, the latter may be
satisfactory, while the former may be unacceptable. By
determining the said distribution function, a measure of data
classifier performance may be formed which is better
tailored to a particular practical application.

The test data may comprise data used to train the data
classifier prior to an assessment of performance, or the test
data may be independent of training data.

Preferably, the step of forming the distribution function
comprises the steps of categorising the measures of
difference into a plurality of categories and counting the number
of measures of difference falling in each category. The
precise boundaries of such categories may not be important,
but it may be desirable, for example, to set categories
representative of unacceptable differences, acceptable
differences and negligible differences. The measure of
performance could then be formed to heavily penalise differences
in the first category, but to ignore differences in the third
category. This may be carried out, for example, by forming
a weighted sum of the number of measures of difference
falling in each category, using a set of predefined weights.
Advantageously, these predefined weights may be chosen to
lend more weight to larger measures of difference than to
smaller measures of difference.

Preferably, the above mentioned weighted sum is
normalised using a factor related to the number of elements of test
input data. This may be carried out by dividing the weighted
sum by [illegible in source] input data. This formulation has
been found to relieve bias in the measure of performance
against smaller sets of test data.

The test data may comprise account fraud data, and in
particular telecommunications account fraud data.

Preferably, the data classifier comprises a neural network.

In an alternative form of the method, the measure of
performance may be formed using a continuous, rather than
a categorised distribution function. In another alternative, a
discrete or continuous weighting function is applied to
each measure of difference, and the measure of performance
is then formed from the so weighted measures of difference.

According to a second aspect of the invention, a
weighting function is applied directly to said measures of
difference, and a measure of performance of the data
classifier is formed from the resulting weighted measures of
difference.

According to a third aspect of the invention, there is
provided a method of training a data classifier that is
operable to generate output data in response to input data,
the method comprising the steps of:

training the data classifier;

forming a measure of performance of the data classifier
using a method described herein; and

optionally retraining the data classifier [illegible in source]
measure of performance.

For example, the data classifier could be repeatedly
retrained until the measure of performance reached a
threshold value. Typically, retraining will be carried out only if it
is expected to improve said measure of performance.

Preferably, the data used for training and retraining
includes some or all of the test data.

According to a fourth aspect of the invention there is
provided a data classifier system comprising:

a data classifier operable to generate elements of result
output data in response to elements of test input data,
said elements of test input data also corresponding to
elements of test output data;

a difference generator operable to determine a measure of
difference between each element of test output data and
each element of result output data;

a distribution function generator operable to form a
distribution function of said measures of difference; and

a performance measure generator operable to form a
measure of performance of the data classifier from said
distribution function.

More generally, the invention provides apparatus operable
to carry out the steps of any of the methods of the invention.
Apparatus embodiments of the invention may be
implemented in software, hardware, or a combination of the two,
for example on one or more computer systems comprising
memory, one or more central processing units and suitable
[illegible in source] mechanisms. Software may also be
provided [illegible in source] be provided on removable media,
may be pre-installed on suitable computer systems, or may
be transmitted as a signal.

Embodiments of the invention will now be described,
with reference to the accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic of a system for assessing the
performance of a data classifier in which data units are
[remainder of sentence illegible in source]

DETAILED DESCRIPTION

A typical data classifier, such as a neural network for
detecting telecommunications account fraud, operates by
generating elements of output data in response to elements
of input data. While each element of input data is typically
a vector or other collection of independent parameters such
as total call time, international call time and call frequency
from a single telephone over a given time interval, each
element of output data is typically a single parameter.
Conveniently, this output parameter may range between
zero, indicating no fraudulent activity, to one, indicating
definite fraudulent activity, with values in between
indicating a probability or degree of confidence of fraudulent
activity. Consequently, a set of training data for training or
constructing such a data classifier will typically comprise a
plurality of different examples of input data vectors, and a
set of corresponding output elements having values of either
one or zero, depending on whether or not the associated
input data did, in fact, result from fraudulent activity.

The classification characteristics of the data classifier may
be assessed by providing test data comprising elements of
test input data and test output data, and operating the data
classifier to generate elements of result output data in
response to the elements of test input data. The differences
between the result output data and the test output data can
then be used to form a measure of performance of the data
classifier. Typically, the classifier may be tested using part or
all of the training data. However, this need not be the case
and independent test data could be used.

A number of categories may be defined to group result
output data elements. The examples here are for result
output data elements having values ranging between zero
and one, which correspond to test output data values of one,
i.e. when the test input data is known to correspond to
fraudulent activity. Some reasonable categories for result
output data values corresponding to test data output values
of 1.0 are shown in table 1. An M4, or "high classification",
is used for result output data values greater than 0.95, an M3,
or "medium classification" is used for result output data
values greater than 0.75 and up to 0.95, an M2, or "low
classification" is used for result output data values greater
than 0.5 and up to 0.75, and an M1, or "mis-classification"
category is used for result output data values less than or
equal to 0.5.



TABLE 1

Category  Test output   Result output     Description
          data element  data element
M1        1.0           x <= 0.5          mis-classification
M2        1.0           0.5 < x <= 0.75   "low" classification
M3        1.0           0.75 < x <= 0.95  "medium" classification
M4        1.0           0.95 < x          "high" classification
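
The four categories described above may be sketched, by way of example only, as a simple function of the result output data value x for an element whose test output data value is 1.0. The function names and the dictionary returned are hypothetical conveniences.

```python
from collections import Counter

CATEGORIES = ("M1", "M2", "M3", "M4")


def categorise(x):
    """Category of a result output value x whose corresponding test
    output value is 1.0 (known fraudulent activity)."""
    if x <= 0.5:
        return "M1"  # mis-classification
    if x <= 0.75:
        return "M2"  # low classification
    if x <= 0.95:
        return "M3"  # medium classification
    return "M4"      # high classification


def category_counts(result_values):
    """Number of result output values falling in each category."""
    counts = Counter(categorise(x) for x in result_values)
    return {name: counts.get(name, 0) for name in CATEGORIES}


print(category_counts([0.20, 0.60, 0.90, 0.99, 0.97]))
```
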



Similar categories could be assigned for elements where
the test output data is zero, i.e. not fraudulent, or
non-fraudulent examples could be combined into the same, or a
similar category scheme. The categorisation scheme
effectively assesses the distribution of differences between the
test and result output data elements, and makes this
distribution function available for further processing.

The above categorisation scheme was applied to a neural
network system trained using eight different input data sets
of telecommunications fraud data, referred to as N1, ..., N8.
The results are summarised in table 2, in which the eight
rows correspond to the eight data sets. The rms-error of each
resulting neural network, calculated from the square root of
the mean of the squares of the differences between each
element of test output data and each corresponding element
of result output data, is shown in the second column, and
converted to a "neural network performance" R in the third
column. Columns four, five and six show the number of
elements of result output data placed in each of categories
M1, M2 and M3 respectively, and the final column shows
the total number of elements of test input data in each of the
eight data sets.



TABLE 2

Data set  rms-error    R     M1  M2  M3  Total
N1        0.056        94.4  0   0   30  1429
N2        0.03727      96.3  2   1   7   [illegible]
N3        [illegible]  95.4  0   3   47  1477
N4        0.06705      90.3  2   0   1   [illegible]
N5        0.01637      98.4  0   0   2   506
N6        0.03703      96.3  1   0   0   1492
N7        0.42788      57.2  1   0   0   100
N8        0.14116      85.8  2   3   2   1475



The results of the above categorisation process may be
used to form a measure of data classifier performance which
is more useful than a simple rms-error in many respects. To
form such a measure of performance it is desirable to take
account of the number of result output data elements falling
in each category, to provide appropriate weights to these
numbers, and to take account of, or provide some
normalization in respect of the total number of elements in the test
data set.
For assessing the performance of a data classifier trained
to detect telecommunications account fraud, it is appropriate
to penalise mis-classifications heavily where the classifier
does not recognise genuine fraudulent activity. An
appropriate set of weightings for the above categorisation scheme
is, for example, to weight the number of result output data
elements falling in category M1 by a multiple of 100, in M2
by 10, in M3 by 1, and in M4 by zero.

Weighting and summing the numbers of result output data
elements falling in each category yields a classification score
inversely indicative of the performance of the data classifier.
To render this score into a useful number ranging from zero
to one hundred, a realistic assessment of the number of input
data elements that can acceptably be mis-classified needs to
be made. A rough estimate for a telecommunication fraud
detection system is that 10% of the input data set being
mis-classified is unacceptable, so this should correspond to
a zero value of the performance measure. The best
performance is for all result output data elements to be classified
in category M4. A suitable formula for such a measure of
performance P1, implementing this scale, is given by:

$$P1 = 100 \exp\left(-\frac{100\,m1 + 10\,m2 + m3}{n}\right)$$

where m1, m2 and m3 are the number of result output data
elements falling in categories M1, M2 and M3, and n is the
number of input or output data elements in the test data set.

The results of applying the above formula for P1 to the
results of training and testing a neural network using the
eight data sets N1-N8 of table 2 are shown in table 3.
Column 2, labelled "M_score" lists the sum of the weighted
counts of the categorised result output data elements
(M_score=100m1+10m2+m3). Column 3, labelled "R",
lists the rms-error based performance measure also shown in
column 3 of table 2. Column 4, labelled "P1", lists the
performance measure calculated using the formula for P1
given above, and column 5 lists the average of columns 3
and 4 for each test data set.
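
A short sketch of the calculation of M_score and P1 follows. It assumes that the formula for P1 reads 100 exp(-(100 m1 + 10 m2 + m3)/n); this reading is a reconstruction which reproduces the P1 values of 97.9, 99.6 and 36.8 quoted for data sets N1, N5 and N7. The function and variable names are hypothetical.

```python
import math

# Category weights taken from the text: M1 by 100, M2 by 10, M3 by 1, M4 by 0.
WEIGHTS = {"M1": 100, "M2": 10, "M3": 1, "M4": 0}


def m_score(counts):
    """Weighted sum of the number of elements in each category."""
    return sum(weight * counts.get(name, 0) for name, weight in WEIGHTS.items())


def p1(counts, n):
    """Performance measure P1 on a scale of zero to one hundred
    (assumed form: 100 * exp(-M_score / n))."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 * math.exp(-m_score(counts) / n)


# Data set N1 of table 2: m1=0, m2=0, m3=30, n=1429.
n1 = {"M1": 0, "M2": 0, "M3": 30}
print(m_score(n1), round(p1(n1, 1429), 1))
```

If 10% of a test data set of n elements is mis-classified, M_score is 10n and the sketch returns 100 exp(-10), which is effectively zero, in keeping with the scale described above.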



TABLE 3

Data set  R     M_score  P1    Average R, P1
N1        94.4  30       97.9  96.1
N2        96.3  217      85.6  91.0
N3        95.4  77       94.9  95.2
N4        90.3  201      88.2  89.3
N5        98.4  2        99.6  99.0
N6        96.3  100      93.5  95.4
N7        57.2  100      36.8  47.0
N8        85.8  232      85.4  85.6

The measure of performance given by the formula for P1
seems to provide a reasonable assessment of classification
performance for neural networks trained to identify fraud in
telecommunications account data. The result for test data set
N7 shown in table 3 is probably unduly harsh, representing
a single mis-classification from a data set of 100
examples. The same circumstances for data set N6 having
1492 examples gives a reasonably high score. It would seem
that some sort of compensation for data set size is needed.
The measure given by P1 penalises mis-classifications,
which is appropriate. For neural networks detecting fraud, a
higher rms-error and no mis-classification is generally
preferable to a lower rms-error with one mis-classification.
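
The dependence on data set size noted above may be illustrated by the following sketch, which evaluates P1 for a single M1 mis-classification in test data sets of various sizes. It assumes the same reading of the P1 formula, 100 exp(-M_score/n), with a weight of 100 for category M1; the function name and the intermediate sizes 500 and 1000 are chosen only for the example.

```python
import math


def p1_single_misclassification(n):
    """P1 for a test data set of n elements containing exactly one M1
    mis-classification and no M2 or M3 elements (M_score = 100)."""
    return 100.0 * math.exp(-100.0 / n)


# Sizes 100 and 1492 correspond to data sets N7 and N6.
for size in (100, 500, 1000, 1492):
    print(size, round(p1_single_misclassification(size), 1))
```
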
It is interesting to compare the P1 column of table 3 with
the rms-error based R column. The result output data
elements generated by neural networks trained and tested using
data sets N1, N3 and N5 contained no mis-classifications. In
two of these cases, the result is a P1 measure of performance
higher than the rms-error based measure R. The only large
differences between the two scoring schemes arise from data
sets N2 and N7. N7, a small data set, is discussed above. The
rms-error based measure R arising from training and testing
using N2 is surprisingly large considering that the result
output data contained two mis-classifications. The low P1
score for N2 appears to reflect over fitting by the neural
network to the remaining data.

As mentioned above, the measure of performance given
by the formula for P1 is biased against smaller sets of test
data. This bias becomes extreme for very small data sets.
Data sets used for training neural networks to detect
telecommunications account fraud may typically comprise
about 500 to 1500 input or output data elements. The
following formula for a correction factor C1 may be used to
substitute for n in the above formula for performance
measure P1, to compensate for variations in data set size
over the range 500-1500.

[equation for C1 not legible in source]

where n is the number of input or output data elements in the
test data set. Performance measure P1 corrected using C1
will be denoted P2. Using correction factor C1 yields a
performance measure P2 which is still rather biased,
yielding unduly low values for small test data sets.

Table 4 is similar to table 3, but with an added column
showing the C1 corrected performance measure P2 for
neural networks trained and tested using data sets N1-N8.
The last column of the table shows the average of the
rms-error based measure R and the corrected performance
measure P2 for each data set.

TABLE 4

Data set  R     M_score  P1    P2           Average R, P2
N1        94.4  30       97.9  97.9         96.2
N2        96.3  217      85.6  [illegible]  90.7
N3        95.4  77       94.9  [illegible]  95.2
N4        90.3  201      88.2  87.5         88.9
N5        98.4  2        99.6  99.6         99.0
N6        96.3  100      93.5  93.2         94.8
N7        57.2  100      36.8  51.3         54.3
N8        85.8  232      85.4  [illegible]  [illegible]

A stronger, perhaps more appropriate correction factor for
telecommunications account fraud data sets of about
500-1000 elements is given by:

[equation for C2 not legible in source]

Table 5 shows the results of substituting the correction factor
C2 in place of n in the formula for P2, to yield a further
revised performance measure P3. Table 5 is similar to tables
3 and 4, having eight rows corresponding to the results of
training and testing a neural network using the eight data sets
N1-N8.

TABLE 5

Data set  R     M_score  P3           Average R, P3
N1        94.4  30       97.7         [illegible]
N2        96.3  217      [illegible]  90.3
N3        95.4  77       [illegible]  [illegible]
N4        90.3  201      86.7         88.5
N5        98.4  2        99.7         99.1
N6        96.3  100      92.8         94.0
N7        57.2  100      [illegible]  [illegible]
N8        85.8  232      83.9         84.9

Performance measure P3 is therefore given by the
formula:

[equation for P3 not legible in source]

This measure provides a performance measure that is
reasonable over a wide range of sizes of test data sets. The size
correction is based on the premise that the performance
measure is intended for use with test data sets containing
about 1000 input or output elements. The correction factor
adjusts the performance measure so that the weighting for
mis-classifications is based on the above discussed
benchmark of 10% mis-classifications yielding a measure of zero.

In addition to increasing the performance measure for
small data sets, the correction factor C2 reduces the
magnitude of the performance measure for data sets having more
than 1000 elements. However, this effect is not large for
reasonably sized data sets of up to a few thousand elements.
This effect, moreover, does not affect the monotonic
behaviour of P2, namely, as the data set size increases so does the
performance measure.

Numerical experiments carried out with genuine
telecommunications account fraud data show that in all cases where
a neural network mis-classifies one or more input elements
of test data, the performance measure P3 is well below the
rms-error based measure R, and that the performance
measure drops rapidly with further mis-classifications. This is
desirable behaviour for a neural network used for detecting
account fraud. Conversely, for the two data sets N1 and N5
above for which the trained neural network did not
mis-classify any of the input data elements, the performance
measure was higher than the rms-error based measure R.
Again, this is desirable behaviour for fraud detection
systems.

Referring now to FIG. 1 there is shown a schematic
diagram of a data classifier with associated apparatus and
data structures for generating a measure of performance of
the data classifier. The illustrated arrangement may be
operated according to any of the methods described above.
Data units are shown as rectangles, functional units as
truncated rectangles and data flows as arrows.

Test data 10 comprises elements of test input data 12 and
corresponding elements of test output data 14. Elements of
the test input data 12 are passed to a data classifier 16. The
data classifier 16 may typically have been trained using part
or all of the test data 10, or may be in the process of being
so trained.

The data classifier 16 generates an element of result
output data in response to each element of test input data,
and passes these elements of result output data to a difference
generator 18. The difference generator 18 compares each
element of result output data with each corresponding
element of test output data 14 and forms therefrom a measure
of difference. If the output elements are scalar values then
the measures of difference may be formed by a simple
subtraction. The measures of difference indicate to what
extent the data classifier is failing to reproduce the
appropriate test data output elements.
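
One possible form of the difference generator is sketched below. For scalar output elements it takes the magnitude of a simple subtraction; the use of the absolute value, and of the Euclidean distance for vector-valued output elements, are assumptions made for the example rather than features stated in the text.

```python
import math


def measure_of_difference(test_output, result_output):
    """Measure of difference between one element of test output data
    and the corresponding element of result output data."""
    if isinstance(test_output, (int, float)):
        # Scalar outputs: magnitude of a simple subtraction.
        return abs(test_output - result_output)
    # Vector outputs (assumed): Euclidean distance between the vectors.
    return math.dist(test_output, result_output)


test_output_data = [1.0, 1.0, 0.0]
result_output_data = [0.25, 1.0, 0.5]
print([measure_of_difference(t, y)
       for t, y in zip(test_output_data, result_output_data)])
```
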
The measures of difference are passed to a distribution
function generator 20, which may conveniently operate
according to the method described above by classifying the
measures of difference into a number of categories 22, and
counting the number of measures of difference so
categorised into each category. Categories M1-M4 described
above, and summarised in table 1, are representative of
"mis-classifications" (large measures of difference),
"low-classifications" (moderate measures of difference),
"medium-classifications" (small measures of difference) and
"high-classifications" (negligible measures of difference).

The distribution function generator 20 generates a
distribution function of the measures of difference, which is
passed to a performance measure generator 24. The
performance measure generator is adapted to form performance
measure 26, for example in accordance with one of the
equations for performance measure P1, P2 or P3 given
above. The distribution function in these equations is
represented by parameters m1, m2 and m3 which are the
number of measures of difference falling into each category
M1, M2 and M3. In forming the measure of performance,
the distribution function is weighted according to a set of
category weights 26, shown as w1-w4 in FIG. 1. In the
method described above, w1=100, w2=10, w3=1 and w4=0.

If the data classifier is in the process of being trained, the
performance measure 26 may be used to assess whether
further training is required, for example by reference to a
threshold.

A number of variations to the described embodiments will
now be discussed. Although a classification scheme using
four categories has been described, using particular
weighting factors, other distribution functions of the differences
between ideal and actual data classifier output data could
equally be used, along with any weighting scheme suitable
for the particular application at hand. Indeed, in other
embodiments of the invention, a weighting function is
applied directly to the measures of difference between the
ideal and actual output data. It will also be apparent that one
or more continuous functions could be used in place of the
discrete categorisation described.

The test output data described in connection with the
embodiments described above has two values: "one"
indicates confirmed fraud, and "zero" indicates confirmed no
fraud. However, other output, and indeed input data types
may be used. The elements of test output data, for example,
could comprise real rather than discrete values, or other data
types such as vectors, as long as a suitable measure of
difference between the test output data and result output data
can be used.

The embodiments have been described in respect of
neural networks trained and tested using
telecommunications account fraud data. Clearly, the invention is also
applicable to the training and testing of other types of data
classifier, and to data classifiers and data classifier systems
adapted for other purposes.

The performance measures described may typically be
implemented in software on suitable computer systems,
which typically will also host the subject data classifier
software.

A particular use of performance measures according to the
invention is in the training of data classifiers. At each stage
of training, such a performance measure may be used to
assess the progress of the training which may, for example,
be halted when the performance measure reaches a
predetermined threshold. Performance measures according to the

What is claimed is:

1. A method of assessing the performance of a data
classifier operable to generate an element of output data in
response to an element of input data, the method comprising:

providing test data comprising elements of test input data
and corresponding elements of test output data;

operating the data classifier to generate elements of result
output data in response to the elements of test input
data;

determining a measure of difference between each
element of test output data and each corresponding
element of result output data;

forming a distribution function of said measures of
difference; and

generating a measure of performance of the data classifier
from said distribution function.

2. A method according to claim 1 wherein forming a
distribution function comprises:

categorizing the measures of difference into a plurality of
categories; and

counting the numbers of measures of difference in each
category.

3. A method according to claim 2 wherein forming a
measure of performance comprises forming a weighted sum
of the number of measures of difference in each category
using predefined weights.

4. A method according to claim 3 wherein the predefined
weights are chosen to provide more weight to larger
measures of difference.

5. A method according to claim 3 wherein forming a
measure of performance further comprises normalizing the
weighted sum using a factor related to the number of
elements of test input data.

6. A method according to claim 5 wherein normalizing
comprises dividing the weighted sum by a factor comprising
the number of elements of test input data.

7. A method according to claim 5 wherein normalizing
comprises dividing the weighted sum by a factor comprising
the reciprocal of a logarithm of the number of elements of
test input data.

8. A method according to claim 1 wherein the test data
comprises telecommunications account fraud data.

9. A method according to claim 1 wherein the data
classifier comprises a neural network.

10. A method of assessing the performance of a data
classifier operable to generate an element of output data in
response to an element of input data, the method comprising:

providing test data comprising elements of test input data
and corresponding elements of test output data;

operating the data classifier to generate elements of result
output data in response to elements of the test input
data;

determining a measure of difference between each
element of test output data and each corresponding
element of result output data;

applying a weighting function to said measures of
difference to form weighted measures of difference; and

generating a measure of performance of the data classifier
from said weighted measures of difference.

11. A method of training a data classifier operable to
generate output data in response to input data, the method
comprising:

training the data classifier;

irjvcritiott may also be used, for example, to compare two or providing test data comprising elements of test input data 

more different data classifier*. and corresponding elements of test output data; 
operating the data classifier to generate elements of result
output data in response to the test input data;

determining a measure of difference between each element
of test output data and each corresponding element
of result output data;

forming a distribution function of said measures of difference;

generating a measure of performance of the data classifier
from said distribution function; and

optionally retraining the data classifier in response to said
measure of performance.

12. A method according to claim 11 wherein optionally
retraining the data classifier is carried out if said retraining
is expected to improve said measure of performance.

13. A method according to claim 11 wherein at least some 
of said test data is used in said training. 

14. A method according to claim 11 wherein at least some
of said test data is used in said optionally retraining.
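
The training method of claims 11 to 14 can be pictured with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: a one-input logistic unit stands in for the data classifier, the four equal-width categories and the weights (0, 1, 2, 4) are hypothetical, a lower measure is taken to mean better performance, and the names `predict`, `train_epoch`, `performance` and `train_with_assessment` do not come from the specification.

```python
import math


def predict(weight, bias, x):
    """A one-input logistic unit standing in for the data classifier."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def train_epoch(weight, bias, data, rate=0.5):
    """One pass of stochastic gradient descent on the log loss."""
    for x, target in data:
        error = predict(weight, bias, x) - target
        weight -= rate * error * x
        bias -= rate * error
    return weight, bias


def performance(weight, bias, test_data):
    """Weighted count of differences per category, divided by N.

    Four equal-width categories and the weights (0, 1, 2, 4) are
    hypothetical choices; lower values mean better performance.
    """
    counts = [0, 0, 0, 0]
    for x, target in test_data:
        diff = abs(target - predict(weight, bias, x))
        counts[min(int(diff * 4), 3)] += 1
    weights = (0, 1, 2, 4)
    return sum(w * c for w, c in zip(weights, counts)) / len(test_data)


def train_with_assessment(train_data, test_data, threshold, max_rounds=200):
    """Train, assess, and retrain while the measure exceeds the threshold."""
    weight, bias = train_epoch(0.0, 0.0, train_data)      # initial training
    score = performance(weight, bias, test_data)
    rounds = 1
    while score > threshold and rounds < max_rounds:       # optional retraining
        weight, bias = train_epoch(weight, bias, train_data)
        score = performance(weight, bias, test_data)
        rounds += 1
    return weight, bias, score, rounds
```

Passing the same list as both `train_data` and `test_data` corresponds to the case in which at least some of the test data is also used in training; separate lists keep the assessment independent of the training set.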

15. A data classifier system comprising:

a data classifier operable to generate elements of result
output data in response to elements of test input data,
said elements of test input data also corresponding to
elements of test output data;

a difference generator operable to determine a measure of
difference between each element of test output data and
each corresponding element of result output data;

a distribution function generator operable to form a
distribution function of said measures of difference; and

a performance measure generator operable to generate a
measure of performance of the data classifier from said
distribution function.

16. A system according to claim 15 wherein the distribution
function generator is further operable to:

categorize the measures of difference into a plurality of
categories; and to

count the number of measures of difference in each
category.



17. A system according to claim 16 wherein the performance
measure generator is further operable to form a
weighted sum of the number of measures of difference in
each category using predefined weights.

18. A system according to claim 17 wherein the predefined
weights provide more weight to larger measures of
difference.

19. A system according to claim 17 wherein the perfor- 
mance measure generator is further operable to normalize 
the weighted sum using a factor related to the number of 
elements of test input data. 

20. A system according to claim 19 wherein the performance
measure generator, in carrying out the normalization,
is further operable to divide the weighted sum by a factor 
comprising the number of elements of test input data. 

21. A system according to claim 19 wherein the performance
measure generator, in carrying out the normalization,
is further operable to divide the weighted sum by a factor 
comprising the reciprocal of a logarithm of the number of 
elements of test input data. 

22. A system according to claim 15 wherein the data
classifier comprises a neural network. 

23. Computer software in a machine readable medium for
providing at least part of a data classifier system when
executed on a computer system, the software being operable
to:

receive test data comprising elements of test input data 
and corresponding elements of test output data; 

receive elements of result output data generated by a data
classifier in response to said test input data; 

determine a measure of difference between each element 
of said test output data and each corresponding element
of said result output data;

form a distribution function of said measures of differ- 
ence; and 

generate a measure of performance of the data classifier 
from said distribution function. 
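
The steps recited in claims 15 to 21 and 23 can be sketched in Python as follows. This is only one possible reading under stated assumptions: the measure of difference is taken to be the absolute difference, the three category boundaries and four weights are hypothetical values, the "factor comprising" language is read in its simplest form (the factor is exactly N, or exactly 1/log N), and the function names are illustrative rather than taken from the specification.

```python
import math
from bisect import bisect_right
from collections import Counter

# Hypothetical boundaries and weights; the claims only require
# "a plurality of categories" and "predefined weights".
BOUNDARIES = (0.25, 0.5, 0.75)  # four categories: [0,.25) [.25,.5) [.5,.75) [.75,1]
WEIGHTS = (0.0, 1.0, 2.0, 4.0)  # more weight to larger differences


def differences(test_outputs, result_outputs):
    """Measure of difference per element: here, the absolute difference."""
    if len(test_outputs) != len(result_outputs):
        raise ValueError("test and result outputs must have equal length")
    return [abs(t - r) for t, r in zip(test_outputs, result_outputs)]


def distribution(diffs):
    """Discrete distribution function: count of differences per category."""
    counts = Counter(bisect_right(BOUNDARIES, d) for d in diffs)
    return [counts.get(i, 0) for i in range(len(BOUNDARIES) + 1)]


def performance(test_outputs, result_outputs, normalise="count"):
    """Weighted sum of category counts, normalised using the element count."""
    diffs = differences(test_outputs, result_outputs)
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two elements (log(1) is zero)")
    weighted_sum = sum(w * c for w, c in zip(WEIGHTS, distribution(diffs)))
    if normalise == "count":        # divide by N
        return weighted_sum / n
    if normalise == "inverse_log":  # divide by 1 / log(N)
        return weighted_sum / (1.0 / math.log(n))
    raise ValueError(f"unknown normalisation: {normalise}")
```

With these assumed boundaries and weights, `performance([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.8])` gives 1.5: two differences fall in the lowest category, one in the third and one in the fourth, so the weighted sum is 6 and N is 4. Because larger differences carry more weight, a lower value indicates a better classifier under this reading.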